Dynamical phase transition due to preferential cluster growth of collective emotions in online communities
We consider preferential cluster growth in a one-dimensional stochastic model describing the dynamics of a binary chain with long-range memory. The model is driven by data corresponding to emotional patterns observed during online community discussions. The system undergoes a dynamical phase transition. For low values of the preference exponent, both states are observed during the string evolution in the majority of simulated discussion threads. When the exponent crosses a critical value, an ordered phase emerges in the majority of threads, i.e., from a certain time moment only one state is represented. The transition becomes discontinuous in the thermodynamical limit, when the discussions are infinitely long, and even an infinitesimally small preference exponent then leads to ordering behavior in every discussion thread. Numerical simulations are in good agreement with an approximate analytical formula.
I. INTRODUCTION

It is well known (see, e.g., [1]) that a one-dimensional (1D) system with short-range forces cannot undergo a phase transition at a nonzero temperature. The situation changes when the interaction range increases; e.g., the Ising chain displays a second-order phase transition when spin interactions decay with the distance r as r^{−(1+σ)} for σ < 1, and non-standard critical exponents are observed for 0.5 < σ < 1 [2]. Another example is the 1D long-range q-state Potts model in which, depending on the exponent σ and the parameter q, a first-order or a second-order phase transition is possible [3].

Some properties of 1D spatial systems with long-range interactions can be mapped onto N-step (long-memory) Markov chains, where transition probabilities depend on the system history. Analytical and numerical solutions for the resulting time-dependent probability distributions were presented in [4] for fixed values of the time horizon N. The formalism was extended in [5, 6] to an infinite-range memory that covers the whole history of a 1D random walker. In such a system, a dynamical phase transition takes place from normal diffusion to super-diffusive behavior. When the parameter describing the memory influence is small enough, the variance D_L of the walker position scales with the walking time L as D_L ∼ L. However, it increases as D_L ∼ L^κ with κ > 1 when the memory influence parameter crosses a critical value. These results can explain the long-term behavior of coarse-grained DNA sequences, written texts and financial data [7].

In this work, we consider a stochastic 1D model of preferential cluster growth, where a special form of long-memory dynamics follows from recent observations of emotional patterns in online community discussions [8]-[13]. In fact, complex phenomena taking place during information search and communication exchange over the Internet have been investigated by several authors using diverse methods of statistical physics, see, e.g., [14]-[20]. These studies are facilitated by easy access to massive data sources [21], [22]. Information and opinion diffusion in online communities is frequently compared to epidemiological phenomena [23-29]. The two processes, however, need separate approaches, as was shown, e.g., in recent investigations [30, 31] of social contagion in online social networks that emerged during a political protest in Spain.

Our model is based on a special collective phenomenon of emotional interactions reported in [11]. Consecutive comments posted on blogs, the BBC Forum, IRC channels and the Digg website, when represented by binary variables corresponding to the posts' emotional valences [32-34], tend to group in clusters of similar valence, and the cluster growth rate can be well described by a sub-linear preferential rule [11]. It follows that a negative comment is more likely to be posted after a sequence of five negative messages than after four such posts. The persistent dynamics of this system has been confirmed by the Hurst exponent analysis in [10]. The aim of this paper is to study the global behavior of this system from the point of view of dynamical phase transitions. We investigate when, in the course of time, the process of preferential cluster growth leads to the emergence of a critical cluster that is followed by posts always displaying the same valence, and what fraction of all posts belongs to such an ordered phase.

This paper is organized as follows. In Sec. II we describe observations of emotional clusters in massive data sets, in Sec. III we define a data-driven model for post appearance, and in Sec. IV we present numerical simulations showing a transition between a mostly disordered (hetero-emotional) and a mostly ordered (mono-emotional) phase in a two-state version of such a model. The model extension to a three-state system is studied in Sec. V, and in Sec. VI we compare critical model parameters to data from selected online communities.



II. PREFERENTIAL GROWTH OF EMOTIONAL CLUSTERS

According to the behavior found in several online communities (BBC Forum […], Digg, IRC, blog data) and presented in [11, 13], the preferential growth mechanism is the main process responsible for forming emotional clusters. It is manifested by a power-law formula for the conditional probability p(e|ne) that, after n comments with the same emotion e [32-34], the next comment will express a similar sentiment. The data (see Fig. 1) reveal the relation p(e|ne) = p(e|e) n^α, where p(e|e) is the conditional probability that two consecutive messages have the same emotion e = −1, 0, 1 (negative, neutral, positive). For a description of the automatic sentiment analysis applied for data retrieval, see [11], [37]-[39]. The characteristic exponent α represents the strength of the preferential process leading to long-range attraction between posts of the same emotion. The probability of finding a cluster of size n is proportional to the factor C = p(e) p(e|e)^{n−1} [(n − 1)!]^α, responsible for the appearance of a sequence of n consecutive messages. It should also be taken into account that a cluster of size n is defined as exactly n posts with mono-emotional expressions. Thus, to get the cluster distribution function one multiplies the factor C by the probabilities 1 − p(e) and 1 − p(e|e) n^α, corresponding to the events that users write comments with emotional states different from e before and after the cluster, respectively. The analytical form of the normalization factor can be obtained only as an approximation. As a result, the distribution of emotional clusters is represented by the function

$$ P_e(n) \propto p(e|e)^{n-1}\,[(n-1)!]^{\alpha}\,\left[1 - p(e|e)\,n^{\alpha}\right], \tag{1} $$

which depends on only two parameters, α and p(e|e).
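To make Eq. (1) concrete, the sketch below evaluates the cluster-size distribution for given p(e|e) and α. It is our own illustration, not the authors' code. The function name `cluster_distribution`, the truncation tolerance, and the choice to normalize numerically over finite clusters only (sizes with p(e|e) n^α < 1) are all assumptions. The unnormalized weights C_n [1 − p(e|e) n^α], with C_n = p(e|e)^{n−1} [(n − 1)!]^α, telescope to 1 − C_{n_c}, where C_{n_c} is the probability of reaching the critical size.

```python
import math

def cluster_distribution(p, alpha, tol=1e-15):
    """Normalized cluster-size distribution of Eq. (1); list index 0 is n = 1.

    Hypothetical helper. p is p(e|e), alpha the preference exponent; sizes with
    p * n**alpha >= 1 (critical clusters that never terminate) are excluded.
    """
    weights, total = [], 0.0
    n, c_n = 1, 1.0                    # c_n = p^(n-1) [(n-1)!]^alpha, equal to 1 for n = 1
    while True:
        stop = 1.0 - p * n ** alpha    # probability that a run of size n ends here
        if stop <= 0.0:                # critical size reached: no finite cluster beyond
            break
        w = c_n * stop
        weights.append(w)
        total += w
        if w < tol * total:            # both factors shrink with n, so the tail is negligible
            break
        c_n *= p * n ** alpha          # grow the run from n to n + 1, cf. Eq. (2)
        n += 1
    return [w / total for w in weights]

if __name__ == "__main__":
    dist = cluster_distribution(0.5, 0.09)         # parameters used for Fig. 2
    mean_size = sum(n * q for n, q in enumerate(dist, start=1))
    print(len(dist), mean_size)
```

For α = 0 the sketch reduces to a geometric distribution with P(1) = 1 − p(e|e). This gives a quick sanity check on the implementation.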

III. MODEL DESCRIPTION 

Here we simulate the process of preferential cluster growth in an artificial environment. To make the problem simpler, we consider a two-state system, so only positive (e = 1) or negative (e = −1) messages can appear in this artificial discussion. Each thread has the same length L, unlike in the real data, where the thread-length distribution was close to a power law (see the Supporting Material in […] and […]).

The evolution rules of this two-state system are as follows:

• the emotion in the first message is randomly chosen with equal probabilities p(e = 1) = p(e = −1) = 1/2;

• the probability of emotion e in the next message depends on the discussion history. Information about this history is coded in the size n of the most recently observed emotional cluster. A cluster of size n is defined as a sub-chain of n consecutive states with the same valence e [11];

• the process of cluster growth is based on the behavior observed in real data. The conditional probability that a cluster containing n consecutive messages with the same valence e increases its length to n + 1 is given by

$$ p_e(n) \equiv p(e|ne) = x_e\, n^{\alpha_e}, \tag{2} $$

where x_e is a constant dependent on the cluster valence e (it amplifies the cluster growth and is equivalent to p(e|e)), while the exponent 0 < α_e < 1 describes the strength of interactions for the emotion e. In the numerical simulation, at each time step we draw a random number uniformly from [0, 1]. If it is smaller than p_e(n), the cluster of emotion e is continued; otherwise, the cluster is terminated and the opposite emotion (−e) appears;

• if p_e(n) ≥ 1, the cluster has reached its critical size n_c, which means that from this moment on the discussion is permanently ordered and all subsequent messages in this thread possess the same emotion e.

FIG. 1. (Color online) The conditional probability p(e|ne) of the next comment occurring with the same emotion e for Digg, BBC, blogs and IRC data [11, 13]. Symbols are data (blue triangles, red circles and white squares for negative, positive and neutral clusters, respectively), and lines represent the fit to the preferential attraction relation p(e|ne) = p(e|e) n^α.

One can define T_c as the time when a cluster of the critical size n_c appears. ⟨T_c⟩ is the average over R realizations (threads); in almost all cases we use R = 10^4.
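The evolution rules above can be sketched as a short Monte Carlo routine. This is a minimal illustration under our own assumptions, not the code used for the figures. The names `critical_size` and `critical_time` are hypothetical, the critical size is taken as the smallest integer n with x n^α ≥ 1, and T_c is counted as the index of the post that completes a cluster of size n_c (so that T_c ≥ n_c, as in Fig. 3).

```python
import math
import random

def critical_size(x, alpha):
    """Hypothetical helper: smallest integer n with x * n**alpha >= 1 (math.inf if none)."""
    if x >= 1.0:
        return 1
    if alpha <= 0.0:
        return math.inf                    # p_e(n) = x < 1 for every n
    log_nc = -math.log(x) / alpha          # ln of x^(-1/alpha)
    if log_nc > 700.0:                     # x^(-1/alpha) would overflow a float
        return math.inf
    n = max(1, math.ceil(math.exp(log_nc)))
    while n > 1 and x * (n - 1) ** alpha >= 1.0:   # correct floating-point rounding
        n -= 1
    while x * n ** alpha < 1.0:
        n += 1
    return n

def critical_time(L, x=0.5, alpha=0.3, rng=random):
    """Simulate one two-state thread of length L with p_e(n) = x * n**alpha.

    Returns (T_c, e): the post index at which a cluster of size n_c is completed
    and the emotion of the ordered phase, or (None, e) if no ordering occurs.
    """
    n_c = critical_size(x, alpha)
    e = rng.choice((1, -1))                # first message: p(1) = p(-1) = 1/2
    n = 1                                  # size of the current cluster
    if n >= n_c:
        return 1, e
    for t in range(2, L + 1):
        if rng.random() < x * n ** alpha:  # Eq. (2): the cluster is continued
            n += 1
        else:                              # cluster terminated: opposite emotion
            e, n = -e, 1
        if n >= n_c:                       # critical cluster: ordered from now on
            return t, e
    return None, e

if __name__ == "__main__":
    rng = random.Random(1)
    R, L = 200, 10**4
    results = [critical_time(L, x=0.5, alpha=0.4, rng=rng) for _ in range(R)]
    ordered = [t for t, _ in results if t is not None]
    print("fraction ordered:", len(ordered) / R)
```

Once a critical cluster appears, every subsequent post has the same emotion. The sketch therefore stops at T_c instead of generating the rest of the thread.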
FIG. 2. (Color online) Numerical simulations of the cluster distribution for x = 0.5, α = 0.09 (the same value of α for positive and negative emotions) and L = 2 × 10^…; the line corresponds to Eq. (1).

FIG. 3. (Color online) Left: time T_c needed for the emergence of the critical cluster for x = 0.5, L = 10^7. Right: size of the critical cluster as a function of α for x = 0.66 (blue), x = 0.5 (black) and x = 0.33 (red) (from bottom to top).



IV. TWO-STATE SYSTEM 

If not otherwise stated, we consider the simplest case x = x_1 = x_{−1} = 0.5 and α_{−1} = α_1 = α. The probabilities of both emotions, when calculated in the unordered phase (before the critical cluster occurs), are the same, p(−1) = p(1) = 0.5, and the distribution of the observed cluster lengths is very similar to the one observed in the real data. In Fig. 2 we present the cluster distribution in artificial threads. The line comes from the theoretical prediction based on preferential cluster growth, Eq. (1).

After the transition time T_c, i.e., when the critical cluster appears, the discussion turns into a mono-emotional thread (MET). From this moment on, the probabilities p(−1) and p(1) become 0 and 1 (or 1 and 0). This means that half of the threads are nearly entirely positive and half are nearly entirely negative (if the threads are long enough). The average critical time ⟨T_c⟩ should obviously depend on the strength of emotional interactions, i.e., on the exponent α. Clearly, ⟨T_c⟩ also has to be larger than or equal to the critical size of the cluster, ⟨T_c⟩ ≥ n_c (see Fig. 3). Values of ⟨T_c⟩ are obtained from numerical simulations and n_c from Eq. (2).
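The staircase of critical sizes in the right panel of Fig. 3 follows from Eq. (2), because n_c must be an integer. Below is a minimal sketch, assuming that n_c is the smallest integer with x n^α ≥ 1. The helper name `critical_size` is ours.

```python
import math

def critical_size(x, alpha):
    """Hypothetical helper: smallest integer n with x * n**alpha >= 1 (0 < x < 1, alpha > 0)."""
    n = max(1, math.ceil(x ** (-1.0 / alpha)))    # continuous estimate x^(-1/alpha)
    while n > 1 and x * (n - 1) ** alpha >= 1.0:  # guard against rounding up too far
        n -= 1
    while x * n ** alpha < 1.0:                   # guard against rounding down
        n += 1
    return n

if __name__ == "__main__":
    for x in (0.66, 0.5, 0.33):                   # the three curves of Fig. 3 (right)
        sizes = [critical_size(x, a / 20) for a in range(4, 21)]   # alpha = 0.2 ... 1.0
        print(x, sizes)
```

Smaller x requires a longer run before x n^α reaches 1. This is why the curve for x = 0.33 lies on top.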



Since for some threads the critical cluster is not observed at all, ⟨T_c⟩ is not an appropriate observable, and a more convenient variable is the mean inverse of the critical time

$$ \langle \lambda \rangle = \frac{1}{R} \sum_{i=1}^{R'} \frac{1}{T_c^{(i)}}, \tag{3} $$

where R is the total number of simulated threads and R′ is the number of threads that were ordered during the simulation, i.e., whose critical times were smaller than the thread length. In Fig. 4 we present the relation between ⟨λ⟩ and α. The left plot is in linear scale and clearly displays the staircase shape of this dependence, which follows from the integer values of T_c (compare Fig. 3). The right plot presents, in log-linear scale, a rapid decrease of ⟨λ⟩ for α ≈ 0.15. The multi-step shape for α > 0.3 and the rapid decrease observed for 0.13 < α < 0.2 are only weakly dependent on the system size L. We tested this behavior for different values of L; for clarity, we show only representative simulations for L = 10^6, L = 2 × 10^7 and L = 5 × 10^7. Of course, the thread length L influences the value of α at which order is observed for the first time: it is α = 0.13 for a system of size L = 5 × 10^7 and α = 0.15 when L = 10^3.
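Eq. (3) and the fraction of ordered threads can be evaluated directly from a list of simulated critical times. In the sketch below, the function name is hypothetical. Unordered threads are represented by `None` and contribute zero to the sum, and the division is by the total number R of threads rather than by R′.

```python
def mean_inverse_critical_time(critical_times):
    """Eq. (3): <lambda> = (1/R) * sum over the R' ordered threads of 1/T_c.

    Hypothetical helper. critical_times has one entry per simulated thread
    (R entries); None marks a thread not ordered before t = L (contributes 0).
    Also returns the ordered fraction r = R'/R.
    """
    R = len(critical_times)
    ordered = [t for t in critical_times if t is not None]
    lam = sum(1.0 / t for t in ordered) / R
    return lam, len(ordered) / R
```

For example, critical times [2, 4, None, None] give ⟨λ⟩ = (1/2 + 1/4)/4 = 0.1875 and r = 0.5.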

The probability P_c that a given post starts a critical cluster can be estimated, under the assumption that only one critical cluster can appear in a single discussion thread, as

$$ P_c \approx \langle \lambda \rangle . \tag{4} $$

On the other hand, the probability of finding a cluster of the critical size can be described by a relation similar to the one presented in [11]:

$$ P_c = P(n_c) = A(x,\alpha)\, x^{n_c}\, [(n_c - 1)!]^{\alpha}, \tag{5} $$

where n_c = x^{−1/α} (i.e., n_c = 2^{1/α} for x = 1/2) is the size of the critical cluster. Eq. (5) differs from the analytic calculation presented in [11] (see also the remarks in Sec. II), since here we consider the beginning and not the end of the critical cluster.

The normalization constant in Eq. (5),

$$ A(x,\alpha) = \sum_{n=1}^{n_c} x^{n}\,[(n-1)!]^{\alpha}, \tag{6} $$

was calculated numerically and is presented in Fig. 5. Since the upper limit of the above sum is n_c, this normalization constant is different from that in Eq. (1). For α ≪ 1 we get

$$ A(x,\alpha) \approx \frac{x}{1-x}. \tag{7} $$
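The sum in Eq. (6) is easy to evaluate numerically. This also allows a check of the small-α approximation of Eq. (7). The sketch below is our own possible Python counterpart of the numerical evaluation behind Fig. 5. The name `norm_A` and the stopping rule are assumptions. Each term is evaluated in log space using (n − 1)! = Γ(n), and the loop stops once the terms, which decrease with n below n_c, no longer change the sum.

```python
import math

def norm_A(x, alpha, rel_tol=1e-16):
    """Hypothetical helper for Eq. (6): sum_{n=1}^{n_c} x^n [(n-1)!]^alpha, n_c = x^(-1/alpha)."""
    log_nc = -math.log(x) / alpha
    # upper limit n_c; the small offset protects against exp() rounding just below an integer
    n_max = math.inf if log_nc > 700.0 else math.floor(math.exp(log_nc) + 1e-9)
    total, n = 0.0, 1
    while n <= n_max:
        term = math.exp(n * math.log(x) + alpha * math.lgamma(n))   # x^n * Gamma(n)^alpha
        total += term
        if term < rel_tol * total:   # consecutive terms shrink by the factor x n^alpha < 1
            break
        n += 1
    return total

if __name__ == "__main__":
    for x in (0.33, 0.5, 0.66):
        for alpha in (0.01, 0.1, 0.3):
            print(x, alpha, norm_A(x, alpha), x / (1.0 - x))   # Eq. (6) vs. Eq. (7)
```

For x = 1/2 both Eq. (6) at small α and Eq. (7) are close to 1. This is consistent with the values shown in Fig. 5.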
Combining Eqs. (4)-(5), we obtain

$$ \langle \lambda(x,\alpha) \rangle = A(x,\alpha)\, x^{x^{-1/\alpha}} \left[\left(x^{-1/\alpha} - 1\right)!\right]^{\alpha}, \tag{8} $$
FIG. 4. (Color online) Relation between the inverse of the critical time ⟨λ⟩ and the exponent of affective interactions α for x = 0.5 and different discussion lengths L. Red circles: L = 10^7, blue squares: L = 2 × 10^7, sky-blue triangles: L = 5 × 10^7. Black circles follow from Eq. (8) and are very close to the red line from Eq. (10).




FIG. 5. Values of A(α) estimated for x = 0.5 using Mathematica.



which fits well the behavior of ⟨λ(α)⟩ obtained from the numerical simulations (see the right panel in Fig. 4). The value of ⟨λ(α = 1)⟩ is not obtained from Eq. (5) but may be easily calculated from a simple branching process as



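$$ \langle \lambda(x = 0.5, \alpha = 1) \rangle = 2 \sum_{n=n_c}^{\infty} \left(\frac{1}{2}\right)^{n} \frac{1}{n} = 2\ln 2 - 1 \approx 0.386294, \tag{9} $$

where n_c = 2 for α = 1. In the limit α ≪ 1, Eq. (8) reduces to

$$ \langle \lambda(x,\alpha) \rangle \approx \frac{x^{2}}{1-x}\, \exp\!\left(-\alpha\, x^{-1/\alpha}\right), \tag{10} $$

which follows from Eq. (7), from n_c^α = 1/x, and from the leading-order Stirling approximation (n_c − 1)! ≈ (n_c/e)^{n_c}/n_c, giving x^{n_c}[(n_c − 1)!]^α ≈ x exp(−α n_c). In particular, ⟨λ(x, 0)⟩ = 0.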
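Eqs. (8)-(10) can be compared numerically. The sketch below is a minimal illustration under our own conventions, with hypothetical names `lambda_eq8` and `lambda_eq10`. Eq. (8) is evaluated with the factorial continued to non-integer n_c through (n_c − 1)! = Γ(n_c). This is an assumption, since the text does not say how non-integer n_c is treated. The branching-process value of Eq. (9) is checked against a truncated series.

```python
import math

def norm_A(x, alpha):
    """Eq. (6), summed in log space up to n_c = x^(-1/alpha); assumes alpha is not tiny."""
    n_max = math.floor(x ** (-1.0 / alpha) + 1e-9)
    return sum(math.exp(n * math.log(x) + alpha * math.lgamma(n))
               for n in range(1, n_max + 1))

def lambda_eq8(x, alpha):
    """Eq. (8): A(x, alpha) x^{n_c} [(n_c - 1)!]^alpha, with (n_c - 1)! = Gamma(n_c) (assumed)."""
    n_c = x ** (-1.0 / alpha)
    return norm_A(x, alpha) * math.exp(n_c * math.log(x) + alpha * math.lgamma(n_c))

def lambda_eq10(x, alpha):
    """Eq. (10), alpha << 1: x^2 / (1 - x) * exp(-alpha x^(-1/alpha))."""
    if alpha <= 0.0:
        return 0.0
    log_nc = -math.log(x) / alpha
    if log_nc > 700.0:                 # exp(-alpha n_c) underflows to zero anyway
        return 0.0
    return x * x / (1.0 - x) * math.exp(-alpha * math.exp(log_nc))

# Eq. (9): branching-process value for x = 1/2, alpha = 1 (n_c = 2), truncated series
lambda_alpha1 = 2.0 * sum(0.5 ** n / n for n in range(2, 200))

if __name__ == "__main__":
    print(lambda_alpha1, 2.0 * math.log(2.0) - 1.0)
    for alpha in (0.15, 0.2, 0.3, 0.5):
        print(alpha, lambda_eq8(0.5, alpha), lambda_eq10(0.5, alpha))
```

Both expressions grow rapidly with α in the range shown in Fig. 4. They differ by a prefactor that depends on how the factorial is approximated.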

Let us consider a discussion thread of length L with affective interactions described by the characteristic exponent α, and let us define the fraction of threads that are mono-emotionally ordered (MET) from a certain moment on as r(α, L) = R′/R, where R′ is the number of ordered threads among all R threads. This value is also the probability of MET occurrence before time t = L. Since each post starts a critical cluster with probability ⟨λ⟩, r can be written as

$$ r(\alpha, x, L) = 1 - \left[1 - \langle\lambda(x, \alpha)\rangle\right]^{L}, \tag{11} $$

where the explicit form can be obtained by inserting Eqs. (7) and (8) into Eq. (11). In the limit α ≪ 1 we get from Eqs. (10) and (11)



$$ r(\alpha, x, L) \approx 1 - \left[1 - \frac{x^{2}}{1-x}\, \exp\!\left(-\alpha\, x^{-1/\alpha}\right)\right]^{L}. \tag{12} $$



Results of numerical simulations and theory from Eq. (12) are presented in Fig. 6. As one could expect, the fraction r of the MET phase in all threads increases with the exponent α and with the thread length L. Moreover, for longer threads the agreement between Eq. (12) and numerical simulations is better, and the transition between the states r ≈ 0 and r ≈ 1 becomes steeper. In the thermodynamical limit L → ∞ this transition is discontinuous, since

$$ \lim_{L\to\infty} r(\alpha = 0, x, L) = 0 \tag{13} $$

and

$$ \lim_{L\to\infty} r(\alpha > 0, x, L) = 1. \tag{14} $$

The latter limit holds because ⟨λ⟩ > 0 for every α > 0, so that [1 − ⟨λ⟩]^L → 0 as L → ∞.
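The approximate ordered fraction of Eq. (12), together with the limits (13)-(14), can be explored with a few lines of code. This is a sketch under our own choices, and the function name `ordered_fraction` is hypothetical. `math.log1p` and `math.expm1` are used so that r stays accurate when ⟨λ⟩ is tiny and L is very large. The formula is only meaningful in the small-α regime where Eq. (10) holds.

```python
import math

def ordered_fraction(alpha, x, L):
    """Hypothetical helper, Eq. (12): r = 1 - [1 - x^2/(1-x) exp(-alpha x^(-1/alpha))]^L."""
    lam = 0.0
    if alpha > 0.0:
        log_nc = -math.log(x) / alpha
        if log_nc <= 700.0:           # otherwise exp(-alpha n_c) underflows to zero
            lam = x * x / (1.0 - x) * math.exp(-alpha * math.exp(log_nc))
    # 1 - (1 - lam)^L written as -expm1(L * log1p(-lam)); requires lam < 1
    return -math.expm1(L * math.log1p(-lam))

if __name__ == "__main__":
    for L in (10**3, 10**6, 10**9):
        print(L, [round(ordered_fraction(a, 0.5, L), 4) for a in (0.0, 0.1, 0.15, 0.2, 0.3)])
```

Increasing L shifts the jump from r ≈ 0 to r ≈ 1 toward smaller α and makes it steeper. This mirrors the trend described for Fig. 6.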



Let us define the critical value of the interaction strength as α_c = α(r = 0.5). After a short algebra we get from Eq. (12)

$$ 1 - \frac{x^{2}}{1-x}\, \exp\!\left(-\alpha_c\, x^{-1/\alpha_c}\right) = 2^{-1/L}. \tag{15} $$



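For the symmetrical case x = 1/2 and L ≫ 1 (if not otherwise stated, we use these assumptions in what follows), the prefactor in Eq. (15) equals 1/2 and 1 − 2^{−1/L} ≈ ln(2)/L, so we get the simpler relation

$$ \alpha_c\, 2^{1/\alpha_c} \approx \ln(L) - \ln[2\ln(2)], \tag{16} $$

which can be disentangled as

$$ \alpha_c = -\frac{\ln(2)}{W_{-1}\!\left(-\dfrac{\ln(2)}{\ln\left[L/\ln(4)\right]}\right)}, \tag{17} $$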
FIG. 6. (Color online) Fraction of ordered threads as a function of the exponent α for various thread lengths L. Lines correspond to Eq. (12).

FIG. 7. Dependence of the critical value α_c (violet rhombs) and of the slope tan φ (red circles) on the system size L. The solid line corresponds to Eq. (17) and the dashed one to Eq. (19), where the value of α_c was taken from Eq. (17).



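where W_{−1}(·) is the lower branch of the Lambert W function [40]. A quantitative measure of the system behavior near the transition point α_c is the slope

$$ \tan\phi = \left.\frac{d\, r(\alpha, x, L)}{d\alpha}\right|_{\alpha = \alpha_c}, \tag{18} $$

which, using Eqs. (12) and (15), can be expressed as

$$ \tan\phi = \frac{L}{2}\left(2^{1/L} - 1\right) x^{-1/\alpha_c} \left(-\frac{\ln(x)}{\alpha_c} - 1\right). \tag{19} $$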



For x = 1/2, Eq. (19) can be written as an explicit function of the length L using the result (17). Relations (17) and (19) are presented in Fig. 7, where we see a good fit to the corresponding numerical simulations. In the limit L → ∞, the critical value α_c(L) tends to zero while the slope tan φ(L) diverges to infinity, which is a sign of a discontinuous transition in the thermodynamical limit. It should be stressed that for α = 0 the MET phase does not exist, as shown by Eq. (13).
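Eqs. (15)-(19) can be checked numerically for x = 1/2 using only the Python standard library. In the sketch below, `lambert_w_m1` is a hypothetical helper that computes the lower branch W_{−1} by Newton iteration; a library routine such as SciPy's `scipy.special.lambertw` with `k=-1` could be used instead. `bisect` is a plain bisection, and the slope of Eq. (19) is compared with a central finite difference of Eq. (12) at the root of Eq. (15). All function names are ours.

```python
import math

LN2 = math.log(2.0)

def lambert_w_m1(z, tol=1e-15, max_iter=100):
    """Hypothetical helper: lower real branch W_{-1}(z) for -1/e < z < 0 (Newton)."""
    if not -1.0 / math.e < z < 0.0:
        raise ValueError("this sketch needs -1/e < z < 0")
    l1 = math.log(-z)
    w = l1 - math.log(-l1)                        # asymptotic starting guess, lower branch
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))    # Newton step for w e^w = z
        w -= step
        if abs(step) < tol * abs(w):
            break
    return w

def alpha_c_eq17(L):
    """Eq. (17): alpha_c = -ln 2 / W_{-1}(-ln 2 / ln(L / ln 4)), x = 1/2, L >> 1."""
    return -LN2 / lambert_w_m1(-LN2 / math.log(L / math.log(4.0)))

def ordered_fraction(alpha, L, x=0.5):
    """Eq. (12) for alpha << 1 (alpha must not be so small that exp() overflows)."""
    lam = x * x / (1.0 - x) * math.exp(-alpha * math.exp(-math.log(x) / alpha))
    return -math.expm1(L * math.log1p(-lam))

def slope_eq19(alpha_c, L, x=0.5):
    """Eq. (19): tan(phi) = (L/2)(2^(1/L) - 1) x^(-1/alpha_c) (-ln x / alpha_c - 1)."""
    return (0.5 * L * math.expm1(LN2 / L) * math.exp(-math.log(x) / alpha_c)
            * (-math.log(x) / alpha_c - 1.0))

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_c_eq16(L):
    """Root of Eq. (16), alpha 2^(1/alpha) = ln(L / ln 4), on the branch alpha < ln 2."""
    return bisect(lambda a: a * math.exp(LN2 / a) - math.log(L / math.log(4.0)), 0.02, LN2)

def alpha_c_eq15(L):
    """Root of Eq. (15) for x = 1/2, i.e. r(alpha_c) = 1/2 with r from Eq. (12)."""
    return bisect(lambda a: 0.5 - ordered_fraction(a, L), 0.02, LN2)

if __name__ == "__main__":
    h = 1e-6
    for L in (10**6, 10**7, 5 * 10**7):
        a15 = alpha_c_eq15(L)
        fd = (ordered_fraction(a15 + h, L) - ordered_fraction(a15 - h, L)) / (2 * h)
        print(L, alpha_c_eq17(L), alpha_c_eq16(L), a15, slope_eq19(a15, L), fd)
```

The lower branch is the relevant one because it yields the small root α_c < ln 2. On that root, α 2^{1/α} decreases with α, and α_c tends to zero as L grows.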



V. THREE-STATE SYSTEM 

A natural extension of the two-state system is to add one more state, i.e., e ∈ {−1, 0, 1}. To compare the properties of such systems with our previous results, we contrasted a symmetrical three-state model, where x_{−1} = x_0 = x_1 = 0.5 and α_{−1} = α_0 = α_1, with a symmetrical two-state model, where x_{−1} = x_1 = 0.5 and α_{−1} = α_1. Values of the inverse critical time ⟨λ⟩ as a function of the exponent α are presented in Fig. 8. Since the results for both systems lie on the same line, we can state that the number of possible emotional states does not influence the critical time needed for the emergence of a MET. This observation can be explained as follows. The occurrence of a MET requires the growth of a critical cluster of any emotion e. The growth process depends only on the conditional probability of cluster growth (Eq. (2)), which is insensitive to the number of possible emotional states. If the initial probabilities p(e) of a spontaneous occurrence of every emotional state e are equal and clusters of every emotion possess the same growth parameters α_e and x_e, then the average time needed for the emergence of any critical cluster should be independent of the number of possible emotional states.
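The argument above can be illustrated with a hypothetical k-state version of the simulation. The text does not state which emotion follows a terminated cluster when more than two states exist, so here we assume it is drawn uniformly from the other states. Cluster growth and the choice of the next emotion use separate random streams. For a fixed seed, the growth draws, and hence T_c, are therefore identical for two and three states whenever all states share x and α.

```python
import random

def critical_time_k(L, n_states, x=0.5, alpha=0.5, seed=0):
    """Hypothetical k-state variant of the two-state model with shared x and alpha.

    Returns the index of the post that completes a critical cluster, or None.
    """
    if alpha <= 0.0 or not 0.0 < x < 1.0:
        raise ValueError("this sketch assumes alpha > 0 and 0 < x < 1")
    grow = random.Random(seed)          # stream used only for cluster growth
    pick = random.Random(seed + 1)      # stream used only for choosing emotions
    states = list(range(n_states))
    n_c = 1
    while x * n_c ** alpha < 1.0:       # critical size: smallest n with x n^alpha >= 1
        n_c += 1
    e, n = pick.choice(states), 1       # first message: every state equally likely
    for t in range(2, L + 1):
        if grow.random() < x * n ** alpha:        # Eq. (2), identical for every state
            n += 1
        else:                                     # assumed: any other state, uniformly
            e, n = pick.choice([s for s in states if s != e]), 1
        if n >= n_c:
            return t
    return None

if __name__ == "__main__":
    for seed in range(5):
        print(seed, critical_time_k(10**5, 2, seed=seed), critical_time_k(10**5, 3, seed=seed))
```

Under this coupling the equality of critical times is exact rather than statistical. It isolates the point made in the text: T_c depends only on the cluster-size dynamics.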

Fig. 8 also shows the results for an asymmetrical three-state system, where x_{−1} = x_0 = x_1 = 0.33. We investigated models in which one or two emotional states are random (α_{−1} = 0 and/or α_0 = 0) and the preferential process appears only for the remaining emotional states. We observe that for small values α < 0.25 all three considered curves collapse.




FIG. 8. (Color online) Relation between the observable ⟨λ⟩ and the exponent α; black points: x = 0.5 and α_{−1} = α_1 (two-state system); red diamonds: x = 0.5 and α_{−1} = α_1 = α_0 (three-state system); orange, violet and green points are for x = 0.33 (three-state system) and different values of α; L = 2 × 10^6.
VI. REAL-WORLD DATA

Let us consider the behavior of the proposed model for parameter values corresponding to a real exchange of messages. For the BBC Forum, the parameters are α_{−1} = 0.051, α_1 = 0.38, α_0 = 0.45 (see Ref. [11]). In the numerical simulation, the first messages were randomly chosen according to the emotional probabilities p(1) = 0.16, p(−1) = 0.65, p(0) = 0.19 calculated for this data set. The parameters x_0 = 0.2, x_1 = 0.27 and x_{−1} = 0.69 were also taken from the BBC Forum as the conditional probabilities p(e|e).

It follows that the average time corresponding to the ordering phenomenon can be estimated as ⟨T^BBC⟩ ≈ 57,000. This value is much larger than the average thread length observed in the BBC data. However, since the BBC dataset contains in total N_BBC = 2,474,781 comments [11], on average there were M_BBC = N_BBC × λ_BBC ≈ 43 cases where the MET phase could appear and discussion participants were not able to present another emotion. A similar situation took place for the Digg data, where λ_Digg = 9.9 × 10^{−6}, which corresponds to ⟨T^Digg⟩ ≈ 101,000. Since N_Digg = 1,646,153 [11], M_Digg ≈ 16. Both values M_BBC and M_Digg are much lower than the total numbers of observed threads in both communities, which were N^BBC_thread = 97,946 and N^Digg_thread = 129,998, respectively [11]. Thus, although there are collective emotional interactions in the above online communities, the majority of discussion threads are not pinned to a given emotion.
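The estimates M = N × ⟨λ⟩ quoted above can be reproduced with simple arithmetic, taking ⟨λ⟩ ≈ 1/⟨T⟩ for the BBC Forum. This sketch only restates the numbers given in the text; it does not rerun the model.

```python
# Numbers quoted in Sec. VI; for the BBC Forum, lambda is approximated as 1 / <T>.
N_BBC, T_BBC = 2_474_781, 57_000        # comments, average ordering time
N_DIGG, LAMBDA_DIGG = 1_646_153, 9.9e-6

M_BBC = N_BBC / T_BBC                   # expected number of ordering events (BBC)
M_DIGG = N_DIGG * LAMBDA_DIGG           # expected number of ordering events (Digg)
T_DIGG = 1.0 / LAMBDA_DIGG              # corresponding average ordering time (Digg)

print(f"M_BBC ~ {M_BBC:.0f}, M_Digg ~ {M_DIGG:.0f}, <T_Digg> ~ {T_DIGG:,.0f}")
# prints: M_BBC ~ 43, M_Digg ~ 16, <T_Digg> ~ 101,010
```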



VII. CONCLUSIONS

We studied a specific long-memory stochastic process that represents a data-driven binary model of emotional online discussions. Analytical and numerical calculations show that in the course of time persistent mono-emotional threads can emerge from clusters of a critical size. Such threads exist as a majority phase above a critical value α_c of the emotional interaction exponent, and this value decays to zero when the discussion length tends to infinity. In this thermodynamical limit there is a discontinuous transition between a phase without mono-emotional threads and a phase in which every thread is emotionally ordered from a certain time moment T_c. The value of T_c is independent of the system size; however, there are discontinuous changes of T_c for α > 0.3. We obtained analytical forms for the values of T_c, α_c and the fraction r of ordered threads.

The extension of the model to three-state dynamics does not change its main properties; e.g., the critical time T_c depends in the same way on the emotional interaction exponent α. Applying the results of our model to the BBC and Digg data provides evidence that the mono-emotional state could be present in a very small fraction of the observed discussion threads.

Comparing our results to the long-memory Markov chains studied in [4-7], we see that the preferential cluster growth process described by Eq. (2) leads to a phase transition only in the thermodynamical limit L → ∞. For finite systems we observe a continuous increase of the MET phase with the strength of interactions (see Eq. (11) and Fig. 6), even for α → 0. Thus our model behaves differently from the N-step Markov model [4-7], where finite-size effects do not preclude a dynamical phase transition. On the other hand, in the thermodynamical limit our system displays a first-order phase transition that was not observed in the quoted studies.